Journal of Molecular Evolution

Springer Science and Business Media LLC

Preprints posted in the last 30 days, ranked by how well they match Journal of Molecular Evolution's content profile, based on 21 papers previously published here. The average preprint has a 0.01% match score for this journal, so anything above that is already an above-average fit.

1
Identification of a Third Period-tuning Site in Cyanobacterial Clock Protein KaiC

Horiuchi, K.; Furuike, Y.; Ito-Miwa, K.; Onoue, Y.; Akiyama, S.

2026-05-14 biochemistry 10.64898/2026.05.11.724173 medRxiv
Top 0.1%
3.1%

KaiC, a clock protein in cyanobacteria, cycles between dephosphorylated and phosphorylated states in a 24-hour period in the presence of KaiA and KaiB. We identified the 322nd residue of KaiC as a third example of period-tuning sites. Site-directed saturation mutagenesis at the 322nd position resulted in a variety of KaiC mutants exhibiting either shortened or lengthened cycles. The tunable range of the periods was from approximately 11 to 78 h without significantly compromising temperature compensation. We conducted biochemical analyses of the 322nd variants and examined their predicted structural models. In contrast to another known period-tuning site, where the period decreases sharply as the side-chain volume increases due to mutations, the cycle lengths correlate only modestly with bulkiness at the 322nd residue. The 322nd residue is located in the C-terminal domain of KaiC and influences ATPase cycles in both the C-terminal domain and the N-terminal domain through its interaction with a flexible loop connecting the two domains. The structural models predict that placing less bulky but polar side chains, such as serine and threonine, at the 322nd position leads to the formation of a hydrogen-bonding network between that site and the loop. This reduces the mobility of the loop, resulting in longer cycles due to decreases in the ATPase activity of the N-terminal domain. Conversely, placing bulky residues such as phenylalanine at the 322nd position appears to alter the loop structure, shortening the periods by enhancing the ATPase activities of both domains. The third period-tuning mechanism is distinct from other known mechanisms.
Significance Statement: A Kai-protein clock system serves as a model for studying how long circadian rhythms are achieved. We identified the 322nd residue of KaiC as a third example of period-tuning sites that allow tuning of the period in both the long- and short-period directions. The third period-tuning mechanism differs from the two previously known types in several respects. Previous studies have suggested that the ATPase activity in the N-terminal domain of KaiC is the primary regulator of the period. On the other hand, the 322nd residue of KaiC can affect the period by activating the ATPase cycle in its C-terminal domain. Our findings will stimulate future studies on the period-tuning mechanism mediated by the ATPase activity in the C-terminal domain of KaiC.

2
Multiple molecular and cellular properties jointly affect protein and site-specific evolutionary rates

Saini, A.; Usmanova, D. R.; Supo Escalante, R.; Vitkup, D.

2026-05-23 evolutionary biology 10.64898/2026.05.20.726710 medRxiv
Top 0.1%
2.1%

Protein evolutionary rates vary widely across proteins and among sites within proteins, reflecting multiple molecular, cellular, and functional constraints. While protein-level properties, such as expression and essentiality, and site-level structural and functional constraints, are known to influence evolutionary rates, how these constraints combine across scales to determine site-specific evolutionary rates remains unclear. Moreover, because many protein features are strongly correlated, it is difficult to disentangle their individual contributions to evolutionary rate variance, and unified predictive models that integrate these properties are still lacking. Here, we use neural networks to predict protein evolutionary rates across multiple scales based on multiple molecular and cellular features. At the protein level, integrating molecular and cellular descriptors explains substantial variance in evolutionary rates across proteins in multiple eukaryotic species, including nearly 50% of the variance in humans and substantial fractions of the variance in other eukaryotic species. The model also allows us to identify proteins whose evolutionary rates deviate from expectations based on their molecular and cellular properties. At the site level, we found that structural and functional features explain a comparable fraction of the variance in relative evolutionary rates. By integrating protein-level and site-level predictors, the model explains up to 37% of the variance in site-specific evolutionary rates across proteins. Our analysis demonstrates that constraints at these two scales combine largely additively, with protein-level properties setting the overall evolutionary context and site-level properties shaping variation within proteins. Together, these results provide a quantitative framework for understanding protein evolution across biological scales.

3
In silico restriction site analysis of whole genome sequences shows patterns caused by selection and sequence duplications

Vedder, L.; Schoof, H.

2026-05-16 genomics 10.64898/2026.05.15.725336 medRxiv
Top 0.2%
1.5%

Biological sequences are known to be non-random. Thus, comparing the in silico restriction fragment distributions of random and biological sequences may serve as an indicator of this non-randomness. Our analyses show that, for most of the tested combinations of restriction enzyme and genome sequence, the number of fragments per megabase of the biological sequence deviates by more than 10% from that of the corresponding random sequence. This deviation goes in both directions, i.e. clearly increased values are as common as clearly decreased values. Although there is no species- or restriction-enzyme-specific effect, a clear impact of the GC content of both the restriction site and the genome sequence can be seen. In contrast to the random sequences, the genome sequences show distinct peaks in their fragment length distributions, hinting at repetitive elements such as transposons.
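
The comparison described in this abstract can be illustrated with a minimal sketch. It is not the authors' pipeline: the function names, the choice to cut at the first base of each recognition site, and the GC-matched random model are all assumptions made for illustration.

```python
import random


def fragment_lengths(seq, site):
    """Fragment lengths after cutting seq at the start of every occurrence of site.

    Assumption: the cut is placed at the first base of the recognition site;
    real enzymes cut at an enzyme-specific offset.
    """
    cuts = []
    pos = seq.find(site)
    while pos != -1:
        cuts.append(pos)
        pos = seq.find(site, pos + 1)
    bounds = [0] + cuts + [len(seq)]
    # Drop zero-length fragments (e.g. a site at position 0).
    return [b - a for a, b in zip(bounds, bounds[1:]) if b > a]


def fragments_per_megabase(seq, site):
    """Number of fragments normalised to one million bases."""
    return len(fragment_lengths(seq, site)) / len(seq) * 1e6


def random_sequence(length, gc_content, seed=0):
    """Random DNA sequence with the given expected GC content (hypothetical null model)."""
    rng = random.Random(seed)
    at = (1.0 - gc_content) / 2.0
    gc = gc_content / 2.0
    return "".join(rng.choices("ACGT", weights=[at, gc, gc, at], k=length))


def relative_deviation(genome, site, seed=0):
    """Relative difference in fragments per megabase between genome and a GC-matched random sequence."""
    gc = (genome.count("G") + genome.count("C")) / len(genome)
    expected = fragments_per_megabase(random_sequence(len(genome), gc, seed), site)
    observed = fragments_per_megabase(genome, site)
    return (observed - expected) / expected
```

Under these assumptions, a `relative_deviation` with an absolute value above 0.1 would correspond to the "more than 10%" deviation mentioned in the abstract.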

4
Gene family evolutionary dynamics reveal convergent genomic signatures in pancrustacean metamorphosis

Campli, G.; Chipman, A. D.; Waterhouse, R. M.

2026-05-08 evolutionary biology 10.64898/2026.05.06.723392 medRxiv
Top 0.2%
1.5%

Arthropods exhibit an exceptional diversity of life histories, where developmental modes involve moulting stage progressions with changes ranging from the bare minimal to the dramatically transformative. While this variability drives many research questions aiming to understand evolutionary and developmental underpinnings of life history differences, it can complicate comparative analyses across taxa. However, this can be approached by applying a framework that defines metamorphosis as a post-embryonic stage progression characterised by substantial changes in morphology and adaptive landscape. Employing this framework with a phylogenomic dataset spanning 26 orders and encompassing four independently arising metamorphic lineages, we explore gene repertoire evolutionary dynamics potentially associated with metamorphosis in Pancrustacea. The approach contrasts gene family evolutionary dynamics inferred to have occurred in the last common ancestors of the metamorphic Insecta, Copepoda, Eucarida, and Thecostraca, with those of their sister lineages, as well as of descendent and ancestral nodes. The results reveal that the metamorphosis ancestors are characterised by an elevated number of gene family births and expansions. Expanded gene families share a set of commonly enriched biological processes across all metamorphosis ancestors, suggesting functional convergence by independent evolution of distinct gene families involved in embryonic and post-embryonic development and nervous system differentiation. Evolutionary modelling further highlights a subset of these families exhibiting signatures of adaptive, lineage-specific gene family size increases associated with metamorphic development. These families include genes implicated in neural and sensory development, segmentation, and moulting. 
These findings support a model of the evolution of pancrustacean metamorphosis where distinct gene families from a common functional toolkit expand and are co-opted into facilitating transitions to multi-phasic life cycles. This reframes the role of moulting in arthropod diversification to be recognised as an important reservoir of genetic change that can potentiate truly remarkable life history transitions.

5
Gene model for the ortholog of Lst8 in Drosophila yakuba

Lawson, M. E.; Sanow, K. A.; Chetana, K.; Taylor, E.; Morgan, A.; Flannery, D.; Elsie, C.; Rele, C. P.; Reed, L. K.; O'Rourke, K. S.

2026-05-14 genomics 10.64898/2026.05.12.723325 medRxiv
Top 0.2%
1.2%

Gene model for the ortholog of Lst8 (Lst8) in the May 2011 (WUGSC dyak_caf1/DyakCAF1) Genome Assembly (GenBank Accession: GCA_000005975.1) of Drosophila yakuba. This ortholog was characterized as part of a developing dataset to study the evolution of the Insulin/insulin-like growth factor signaling pathway (IIS) across the genus Drosophila using the Genomics Education Partnership gene annotation protocol for Course-based Undergraduate Research Experiences.

6
Nearest Neighbor Parameters for Estimating the Folding Stability of RNA Including Pseudouridine

Shabangu, T. S.; Kierzek, E.; Arteaga, S.; Orf, G. S.; Stone, J.; Hiltke, O. M.; Miaro, M.; Jolley, E. A.; Soszynska-Jozwiak, M.; Szabat, M.; Aviran, S.; Bevilacqua, P. C.; Znosko, B. M.; Kierzek, R.; Mathews, D. H.

2026-05-17 biochemistry 10.64898/2026.05.16.725682 medRxiv
Top 0.3%
1.1%

Nearest neighbor parameters are widely used in software for estimating the conformational stability of an RNA sequence folding into a specific structure. Folding stability for RNA with canonical nucleotides A, C, G, and U has been widely studied, but the same is not true for most modified nucleotides. In this work, we present a comprehensive set of nearest neighbor parameters for estimating the folding stability of RNA including pseudouridine in helical or loop contexts. These parameters are derived from 210 optical melting experiments involving helices with pseudouridine-A and pseudouridine-G pairs and with pseudouridine in loop motifs. The experiments include sequences with pseudouridine and U in the same strand, including U-A and U-G pairs, allowing us to consider the folding stability of sequences with both U and pseudouridine. On average, pseudouridine stabilizes RNA folding compared to U in an analogous motif, although this effect is sequence-context dependent. These parameters improve the modeling of folding stability for RNA secondary structures containing pseudouridine. We demonstrate that these parameters successfully model the secondary structure change for Saccharomyces cerevisiae U2 snRNA when two additional inducible pseudouridines are present. These parameters are freely available and incorporated into the RNAstructure software package.

7
Mutational and bioinformatic analysis of the binding site for the ribonucleotide reductase-specific transcriptional repressor NrdR

Shahid, S.; Lundin, D.; Rozman Grinberg, I.; Sjöberg, B.-M.

2026-05-14 molecular biology 10.64898/2026.05.11.724285 medRxiv
Top 0.3%
1.0%

The prevalent transcriptional repressor NrdR binds to highly conserved prokaryotic sequences in the promoter regions of operons encoding the essential enzyme ribonucleotide reductase. The NrdR binding sites consist of two partially palindromic 16 bp sequences (NrdR boxes) separated by a 15-16 bp linker sequence. We have assessed the requirement of both boxes for binding and the propensity of different NrdRs to bind to heterologous binding sites, and we have found that the linker sequence is constrained only in length and not in sequence. As we have observed several deviations from the conserved sequences of the NrdR boxes, we here test the conservation requirements of individual base pairs in the NrdR boxes using a synthetic DNA fragment (Synt DNA) to which the NrdR proteins from the actinomycete Streptomyces coelicolor and the gammaproteobacterium Escherichia coli bind as well as they do to their homologous binding sites. By introducing isolated mutations into Synt DNA and testing the binding capacity of NrdR from S. coelicolor and E. coli, we expand our understanding of the criteria needed to build a functional binding site for the NrdR repressor.

8
Resolving the oak tree of life: comparing RADseq and whole genome resequencing methods for oak phylogenetics

Hipp, A. L.; Althaus, K. N.; Fuller, E. L.; Hahn, M.; Larson, D. A.; Mohn, R. A.; Wang, B.; Manos, P. S.

2026-05-17 evolutionary biology 10.64898/2026.05.14.725274 medRxiv
Top 0.3%
0.9%

Forest trees pose numerous potential challenges to phylogenomic inference. Their large effective population sizes and relatively long generation times lead to deep allele coalescence and consequently incomplete lineage sorting (ILS), which biases inferences of divergence times toward older ages and introduces gene tree discordance. Deep phylogenetic divergences, reaching back into the Paleocene, introduce reference-mapping biases. Introgression--the movement of genes between lineages--may result in different phylogenies being inferred depending on which individuals are included in analysis, even if the plurality of the genome favors the divergence history unaffected by introgression. These factors influence phylogenetic inference across the Tree of Life but are particularly prevalent in forest trees. Oaks (Quercus) are notable for all three influences. In addition, our knowledge of the oak phylogeny is currently based strongly on restriction site associated DNA sequencing (RADseq) datasets published over the past decade, which may introduce additional sources of uncertainty. In this chapter, we analyze a 322-species RADseq dataset and genome resequencing data from across the genus to address sources of uncertainty in our understanding of the global oak phylogeny, which we hope will serve as a model for other research groups working on comparable woody plant groups.

9
Evolutionary rate correlations reveal long-term co-evolutionary interactions in Drosophila melanogaster

Dagilis, A. J.; DiAngelis, B.; Lee, S.; Matute, D. R.

2026-05-23 evolutionary biology 10.64898/2026.05.21.726714 medRxiv
Top 0.3%
0.9%

Co-evolution between genes can occur for a variety of reasons, including co-expression of genes, epistatic interactions between them, physical interactions of gene products and many others. Co-evolutionary partners of a gene are therefore of great interest in identifying potential factors that contribute to any phenotype of interest. State-of-the-art approaches to detect these interactions use correlations of evolutionary rates across a broader phylogeny, and so by necessity identify interactions only among genes that are present across long evolutionary time periods. This makes the methods unwieldy when interest lies in a single focal organism in which the genes of interest may have evolved in the recent evolutionary past. Here, we present a new approach to calculating evolutionary rate correlations which focuses on extracting maximum coverage for a single focal species, while retaining signals of co-evolution across large clades. We show how this approach is able to identify potential interactions even in highly studied species and highly studied genes, with a focus on the D. melanogaster sex-determiner, Sxl, using data from 72 species of Dipterans.

10
Origins of eukaryotic metabolism

Santana-Molina, C.; Spang, A.; Snel, B.

2026-05-12 evolutionary biology 10.64898/2026.05.08.723234 medRxiv
Top 0.5%
0.6%

The origin of eukaryotes is a key event in the evolution of cellular life hypothesized to involve a symbiotic integration between a member of the Asgard archaea and the Alphaproteobacteria. Recent work has provided evidence for additional genetic input from other prokaryotes to the eukaryotic proteome, yet the extent and sources of these contributions remain debated. Here we aimed to further resolve the prokaryotic origins of eukaryotic genes to inform our understanding of eukaryogenesis. Specifically, we developed a phylogenetic framework to investigate the origins of eukaryotic gene families associated with metabolism and informational processing for comparison. We found that informational processing genes were predominantly derived from archaea whereas eukaryotic metabolism is highly chimeric in its origin. In contrast to previous studies, we report a substantial number of archaeal origins of diverse metabolic enzymes including key metabolic regulators. This highlights an overlooked participation of archaeal metabolism and pinpoints potential metabolic integrations during eukaryogenesis. Apart from the alphaproteobacterial contributions to eukaryotic metabolism, we found an additional dominant phylogenetic signal of genes potentially derived from Myxococcota, especially for gene families associated with lipid metabolism. By systematically analysing the origins of eukaryotic metabolism, this research offers novel insights into the origin of eukaryotic membranes and refines our current models for the origin of the eukaryotic cell.

11
Phylogenomics, Biogeography, and a New Family-level Classification of Silversides, Rainbowfishes, and Allies (Teleostei: Atheriniformes)

Hughes, L. C.; de Brito, V.; Piller, K.; Kimura, S.; Unmack, P. J.; Arcila, D.; Betancur-R., R.; Bloom, D. D.; Orti, G.

2026-05-07 evolutionary biology 10.64898/2026.05.05.722987 medRxiv
Top 0.5%
0.6%

The order Atheriniformes (silversides, rainbowfishes, and blue-eyes) is a globally distributed group of fishes with frequent evolutionary transitions between marine and freshwater ecosystems. However, understanding the tempo and mode of these transitions has been hampered by poor phylogenetic resolution and limited taxonomic sampling, particularly within the suborder Atherinoidei. We generated a phylogenomic dataset of 1,100 exon loci for 150 species to resolve interfamilial relationships and reconstruct the group's biogeographic history. We were also able to incorporate a large number of existing GenBank sequences, producing a phylogeny with 265 species sampled for at least some genetic data (67% of known species diversity). While the New World suborder Atherinopsidae is well-resolved, we found that the family Atherinidae is polyphyletic across all analyses. We propose a revised classification that restricts Atherinidae to the genus Atherina and recognizes Atherinomoridae and Craterocephalidae as separate families. Our biogeographic inferences using explicit geographic areas suggest more frequent marine-to-freshwater transitions than previously inferred with simplified binary (marine vs. freshwater) coding, and uncover habitat transitions where marine ancestors may have gone extinct. These results highlight how explicit geographic modeling can uncover marine ancestry erased by extinction, providing a robust phylogenetic framework for future evolutionary studies of Atheriniformes.

12
Genetic polymorphisms in a mate choice locus are maintained by balancing selection in a wild medaka population

Fujimoto, S.; Myosho, T.; Kobayashi, H.; Aoyama, H.; Murase, I.; Sumarto, B. K. A.; Yagi, M.; Kunishima, T.; Matsunami, M.; Kimura, R.

2026-05-08 evolutionary biology 10.64898/2026.05.06.723183 medRxiv
Top 0.5%
0.5%

Sexual selection arises from individual differences in reproductive success, which can drive the maintenance of genetic polymorphisms in genes subject to balancing selection through pleiotropic effects that trade off between survival and reproduction. However, the extent to which sexual selection maintains genetic polymorphisms in wild populations remains unclear. Here, we explored genomic signatures of balancing selection and selective sweeps in the northern medaka, Oryzias sakaizumii, in Japan by performing whole-genome resequencing of wild individuals. In addition, we re-evaluated the population genetic structure and admixture of Oryzias latipes and O. sakaizumii across the Japanese archipelago and detected genomic regions affected by introgression. Regions with signatures of selection from multiple statistics were located on eleven chromosomes. In particular, a region spanning 4.25 to 6.80 Mb on chromosome 18 showed high genetic diversity that could not be explained by sex differentiation or introgression from O. latipes in Eastern Japan. This pattern suggests that balancing selection maintains genetic polymorphisms in O. sakaizumii. Specifically, because a previously reported quantitative trait locus associated with female mating behavior overlaps with this region, we infer that sexual selection contributes to the maintenance of genetic polymorphism at this locus.

13
New chromosome-level haplotyped genome assemblies and annotation for the Japanese Quail (Coturnix Japonica)

Cabau, C.; Degalez, F.; Leroux, S.; Gourichon, D.; Serre, R.-F.; Vernette, C.; Donnadieu, C.; Iampietro, C.; Vandecasteele, C.; Pitel, F.; Klopp, C.

2026-05-14 genomics 10.64898/2026.05.12.724545 medRxiv
Top 0.6%
0.5%

The Japanese quail (Coturnix japonica) is a widely used model organism in developmental biology, genetics, and agriculture. Here, we present new, haplotyped, high-quality genome assemblies of the Japanese quail, generated using a combination of state-of-the-art sequencing technologies, including PacBio HiFi long reads, Oxford Nanopore sequencing, and Hi-C scaffolding. This assembly has a total length of 1.19 Gb, 80% of which is included in chromosomes, and is highly complete (BUSCO score aves_odb10: 97.3). Assembly metrics show a marked improvement in contiguity, with a significantly higher scaffold N50 and a lower number of contigs compared to the reference genome assembly. Remarkably, the assembly extends previously truncated chromosome ends, with 31 telomeres detected. In addition, we merged the existing Ensembl and Refseq annotations and obtained a combined set of 26,102 genes, of which 25,038 genes were successfully mapped on the improved assembly haplotype 1 (Cjap1.hap1). Together, these new genome assemblies and their enriched annotation provide a robust genomic framework for future research. They enhance our ability to investigate developmental processes, genetic and epigenetic inheritance, and host-pathogen interactions. Furthermore, they offer valuable insights for conservation genetics and sustainable breeding programs. This resource represents a critical step forward in leveraging the full potential of the Japanese quail as a model species in both basic and applied research.

14
A Rarefaction Approach to Identify Local Introgression in a Three Population Tree

Smith, T. Q.; Szpiech, Z. A.

2026-05-16 evolutionary biology 10.64898/2026.05.13.724952 medRxiv
Top 0.6%
0.4%

Patterson's D statistic, also known as the ABBA-BABA statistic, is widely used to detect the presence of archaic genome-wide introgression between two non-sister taxa. Requiring only a single lineage from each of four taxa, where one taxon acts as an outgroup to determine the ancestral allele, Patterson's D measures the imbalance between the number of biallelic sites where the second and third taxa share the derived allele (ABBA sites) and the number where the first and third taxa share it (BABA sites). When there is no introgression, these counts are expected to be equal, and a discordance between counts suggests introgression from the third taxon into either the first or second. Patterson's D is limited to the detection of genome-wide introgression and exhibits a high false-positive rate when applied to smaller genomic segments. Here, we present a new method, D STatistic with Allelic Rarefaction (D*), to address these limitations. D* uses multiple lineages and does not require an outgroup to calculate the imbalance between the number of alleles found exclusively in the second and third taxa and the number of alleles found exclusively in the first and third taxa. D* employs a rarefaction technique to correct for unequal sample sizes and allows multiallelic sites. We use simulations to show that D* has better precision and recall for detecting introgressed segments of DNA when compared to similar methods under a wide variety of model parameters and in the presence of technical artifacts common to ancient DNA analyses. We conclude with an analysis of Denisovan DNA introgression in modern-day Papuans. Precompiled executables, the manual, and source code can be found at https://github.com/TQ-Smith/DSTAR
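
For orientation, the classical ABBA-BABA statistic that this abstract takes as its starting point can be sketched as below. This illustrates only the standard four-taxon D, not the D* method or the DSTAR software; the function name and the one-string-per-taxon input format are assumptions.

```python
def patterson_d(p1, p2, p3, outgroup):
    """Classical Patterson's D from one aligned lineage per taxon.

    Each argument is a string of alleles, one character per site.
    The outgroup allele is taken as ancestral (A); the other allele is derived (B).
    Returns (D, n_abba, n_baba). D is set to 0.0 when no informative sites
    exist (a convention chosen for this sketch).
    """
    abba = 0
    baba = 0
    for a1, a2, a3, ao in zip(p1, p2, p3, outgroup):
        if len({a1, a2, a3, ao}) != 2:
            continue  # keep biallelic sites only
        if a3 == ao:
            continue  # the third taxon must carry the derived allele
        if a2 == a3 and a1 == ao:
            abba += 1  # pattern A B B A: taxa 2 and 3 share the derived allele
        elif a1 == a3 and a2 == ao:
            baba += 1  # pattern B A B A: taxa 1 and 3 share the derived allele
    total = abba + baba
    d = (abba - baba) / total if total else 0.0
    return d, abba, baba
```

A positive D indicates an excess of ABBA sites and a negative D an excess of BABA sites. Significance testing (for example by block jackknife) is outside the scope of this sketch.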

15
Enzymatic and Biophysical Analysis of two Highly Related Cytochrome P450 Reductases from Artemisia annua Reveals Differences in Their Ligand Interactions and Domain Motions

Mostert, B.; Judd, R.; Makris, T.; Xie, D.

2026-05-17 plant biology 10.64898/2026.05.13.725038 medRxiv
Top 0.7%
0.4%

Artemisinin is an effective antimalarial drug sourced from Artemisia annua, but its low and variable yields require enhancement either semi-synthetically or in-planta to meet the global demand for treatment. Though essential enzymes have been identified in the artemisinin biosynthetic pathway, including an essential Cytochrome P450 monooxygenase (CYP71AV1), there are still many unknowns. Cytochrome P450 reductase 1 (herein, AaCPR1), has been experimentally confirmed as an electron transfer partner for CYP71AV1 in its three step oxygenation of key artemisinin precursors. However, the recent discovery of a highly related CPR, herein AaCPR2, introduces the possibility that another, potentially more catalytically favourable interaction, could exist for CYP71AV1. Therefore, enzyme kinetics and differential scanning fluorimetry (DSF) were used in the characterisation of both AaCPR1 and AaCPR2 to determine the existence and source of their catalytic differences. Tested enzyme activity under cytochrome c and NADPH concentrations revealed that AaCPR1 had lower Km and higher kcat/Km values, while AaCPR2 had higher Vmax and kcat values. This suggests that AaCPR1 is more effective at reducing cytochrome c when substrate conditions are limiting, whereas AaCPR2 is more effective than AaCPR1 at reducing cytochrome c when substrate conditions are saturating. This implies a functional partitioning of the two enzymes on the basis of substrate availability. The DSF results provided deeper insight into the different protein-ligand interactions between the two enzymes. AaCPR2 reached lower maximum melting temperatures across all tested conditions, whereas AaCPR1 had higher maximum melting temperatures. Thus, AaCPR1 exhibits higher thermal stability and has a higher temperature threshold than AaCPR2. This contributes to the notion that the AaCPRs are functionally divergent also on the basis of temperature. 
The cumulative differences in melting behaviour between the two enzymes led to the hypothesis that AaCPR1 and AaCPR2 exhibit different domain motions that may lead to preferential catalysis for one redox partner over another. This was further supported by the prediction of a highly variable loop region between the two enzymes at the connecting domain just after the flexible hinge. If such loops are highly mobile, as predicted, then the residue differences therein could provide a bio-structural basis for the kinetic and thermal/biophysical differences observed between AaCPR1 and AaCPR2. These data support the conclusion that AaCPR1 and AaCPR2 possess fundamental biophysical differences despite their high degree of relatedness. Ultimately, these differences suggest differential metabolic functions of the two enzymes in artemisinin biosynthesis and/or other important secondary metabolic processes.
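
The kinetic argument in this abstract (one enzyme favoured at limiting substrate, the other at saturating substrate) follows from the Michaelis-Menten rate law v = Vmax·[S] / (Km + [S]). The sketch below uses purely hypothetical parameter values, not measurements from the preprint, to show how two such rate curves can cross.

```python
def michaelis_menten(s, vmax, km):
    """Michaelis-Menten rate at substrate concentration s."""
    return vmax * s / (km + s)


def crossover_concentration(vmax1, km1, vmax2, km2):
    """Substrate concentration at which the two rate curves intersect.

    Solves vmax1*s/(km1+s) == vmax2*s/(km2+s) for s > 0.
    Returns None when the curves do not cross at a positive concentration.
    """
    if vmax1 == vmax2:
        return None
    s = (vmax1 * km2 - vmax2 * km1) / (vmax2 - vmax1)
    return s if s > 0 else None


# Hypothetical values: enzyme 1 has lower Km and higher Vmax/Km,
# enzyme 2 has higher Vmax (arbitrary units).
ENZYME_1 = {"vmax": 1.0, "km": 1.0}
ENZYME_2 = {"vmax": 2.0, "km": 4.0}
```

With these illustrative numbers, enzyme 1 is faster below the crossover concentration and enzyme 2 is faster above it, which mirrors the functional partitioning the abstract proposes.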

16
COSMIC-Linked Ras Mutations at the Interface Between H-Ras and PI3KγRBD Frequently Generate Affinity Increases as Well as Affinity Decreases

Mead, E. H.; Batz, K. C.; Shih, K.-H.; Fleming, I. R.; Tesdahl, C. D.; Lizardos, L.; Armendariz, J. R.; Hannan, J. P.; Hickey, A. M.; Leyk, A.; Erbse, A. H.; Falke, J. J.

2026-05-06 biochemistry 10.64898/2026.05.01.722339 medRxiv
Top 0.7%
0.4%

The three conventional isoforms of the Ras G-protein (H-, K-, N-Ras) function as molecular on-off switches that regulate a wide array of signaling pathways, including the Ras-PI3K-PIP3-PDK1-AKT pathway that is central to innate immunity and normal cell growth, and is dysregulated in many disease states. Activation of the pathway by Ras requires adequate Ras-PI3K binding affinity. Here we focus on the interface of known structure in the H-Ras:PI3Kγ co-complex essential to multiple pathways including directed pseudopod growth in leukocyte chemotaxis. At this interface 10 H-Ras residues, all 100% conserved between the H-, K- and N-Ras isomers, contact the Ras binding domain of PI3Kγ (PI3KγRBD). To investigate the degree to which the native H-Ras:PI3KγRBD interface is optimized by evolution for maximal binding affinity, 8 interfacial Ras mutations selected from the COSMIC database and the literature were introduced at the contact positions. All 8 Ras mutations were observed to alter the H-Ras:PI3KγRBD binding affinity, with 4 mutations yielding significant affinity increases and 4 yielding significant affinity decreases. These findings indicate that the native H-Ras:PI3KγRBD interface provides intermediate, rather than maximal, binding affinity. Such intermediate affinity is consistent with the substantial binding plasticity of the conserved H-, N-, K-Ras effector docking surface, which has evolved to bind a diverse array of effectors. Furthermore, the findings provide evidence that COSMIC-linked mutations at the H-Ras:PI3KγRBD interface frequently generate affinity increases as well as decreases, with potential implications for molecular mechanisms of disease and for tool development in cell biology.

17
Biophysical and enzymatic comparison of Bacillus safensis and Bacillus subtilis malate dehydrogenase (MDH) enzymes

Zafiropoulo, H. R.; Thomas, J. E.; Cortez, N. R.; Apostol, K.; de Sa, A.; Khosravi, R.; Moore, L.; Berndsen, C. E.; Bibel, B.

2026-05-14 biochemistry 10.64898/2026.05.13.723581 medRxiv
Top 0.7%
0.4%

Species of Bacillus bacteria including Bacillus safensis and Bacillus subtilis are finding increasing uses in biotechnology and bioremediation, thanks in part to their metabolic robustness. Malate dehydrogenase (MDH) is at the heart of central metabolism and thus a better understanding of Bacillus MDH proteins could aid in the optimization of these applications. MDH of Bacillus spp. belong to the lactate dehydrogenase (LDH)-like class of MDHs, otherwise known as the MDH3 class. Despite wide prevalence in nature among prokaryotes and archaea, this typically homotetrameric class is understudied compared to the MDH1 and MDH2 classes found in eukaryotes. We therefore recombinantly expressed and purified MDH proteins from two societally relevant Bacillus spp. (B. safensis and B. subtilis) and characterized them biophysically (via Size Exclusion Chromatography-Small Angle X-ray Scattering (SEC-SAXS) and Differential Scanning Fluorimetry (DSF)) and enzymatically (via spectroscopic activity assays). As expected based on their high sequence identity, the two MDH orthologs had similar properties in most regards, including a tetrameric structure and high susceptibility to substrate inhibition. However, we uncovered differences in conditional thermal stability, in addition to subtle differences in enzymatic activity that offer insight into the workings of LDH-like MDH.
Summary statement: Malate dehydrogenase (MDH) is a fundamental metabolic enzyme, from microbes to mammals, yet comparably little is known about microbial MDH, especially MDH of the tetrameric MDH3 class. We compare the biophysical and enzymatic properties of two such enzymes from the societally relevant bacterial species Bacillus subtilis and Bacillus safensis, offering useful insight with potential biotechnological implications.

18
The lack of simplicity in sequence-fitness relationships

Crona, K.; Greene, D.

2026-05-05 evolutionary biology 10.64898/2026.04.30.722031 medRxiv
Top 0.7%
0.4%

Gene interactions play an important role in the development of antimicrobial drug resistance and other evolutionary processes of medical importance. Empirical studies have revealed multiple peaks, inaccessible trajectories, and constraints on mutation order. Higher order epistasis is associated with obstacles in fitness landscapes. However, its significance has been debated in recent years, sometimes through reinterpretations of data from previous publications. We suggest that local higher order interactions may help reconcile these seemingly contradictory findings. Rank order based methods can be informative when other methods fail to detect consequential interactions. In addition to conventional rank order methods, including sign epistasis, we introduce signed bipyramids. A bipyramid interaction compares extreme genotypes against their intermediates, for example a triple mutant and the wild-type against the corresponding single mutants. In general, interactions are signed if they are implied by the rank order alone.
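As a rough illustration of the conventional rank order idea mentioned in this abstract, the sketch below checks a two-locus system for sign epistasis, using the standard definition that a mutation is beneficial on one genetic background and deleterious on another. It is not the authors' method and does not implement signed bipyramids; the function names and fitness values are hypothetical. Because only the signs of fitness differences are used, replacing fitness values with their ranks gives the same answer.

```python
def effect_signs(fitness, locus):
    """Sign (+1, 0, -1) of mutating `locus` from 0 to 1 on each background."""
    signs = []
    for genotype, w in fitness.items():
        if genotype[locus] == "0":
            mutant = genotype[:locus] + "1" + genotype[locus + 1:]
            diff = fitness[mutant] - w
            signs.append((diff > 0) - (diff < 0))
    return signs


def has_sign_epistasis(fitness, locus):
    """True if the mutation at `locus` helps on one background and hurts on another."""
    signs = effect_signs(fitness, locus)
    return 1 in signs and -1 in signs


def classify_two_locus(fitness):
    """Classify a two-locus landscape keyed by '00', '10', '01', '11'."""
    n_loci_with_sign_change = sum(has_sign_epistasis(fitness, i) for i in range(2))
    labels = ["no sign epistasis", "sign epistasis", "reciprocal sign epistasis"]
    return labels[n_loci_with_sign_change]


# Hypothetical fitness values: both single mutants are worse than the
# wild type, but the double mutant is the fittest genotype.
fitness = {"00": 1.0, "10": 0.8, "01": 0.9, "11": 1.2}
# The same landscape reduced to its rank order (4 = fittest).
ranks = {"00": 3, "10": 1, "01": 2, "11": 4}

result_fitness = classify_two_locus(fitness)
result_ranks = classify_two_locus(ranks)
```

Both calls classify the landscape identically, which is the sense in which such an interaction is determined by the rank order alone.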

19
Combining amino acid frequency and 1D convolutional neural network embeddings for the identification of protein-protein interactions using a random forest classifier

Sindhi, N. A.; Pawar, N.; Dixson, J.; Garcia, D.

2026-05-18 bioinformatics 10.64898/2026.05.15.725340 medRxiv
Top 0.7%
0.4%

Predicting protein-protein interactions is a fundamental problem in molecular biology. Experimental approaches for identifying protein-protein interactions are time-consuming and labor-intensive, motivating the development of efficient computational alternatives, including machine learning-based methods. However, conventional machine learning methods often rely on manually engineered features that require substantial domain expertise. In this study, we propose a two-stage framework to address these limitations. In the first stage, a one-dimensional convolutional neural network autoencoder is used to automatically learn latent representations from protein sequences. The quality of these features is evaluated through reconstruction error, reflecting how accurately the model reconstructs the original sequence. In the second stage, these learned features are combined with amino acid frequency-based features to form a hybrid feature set for predicting protein-protein interactions. A systematic comparison is performed between models trained on frequency features alone and those using a hybrid representation. The comparison showed that incorporating one-dimensional convolutional neural network-derived latent features improved the models' performance in predicting protein-protein interactions. The dataset was split into training, validation, and test sets. Nested cross-validation was employed, with inner loops for hyperparameter tuning and outer loops for model selection. The random forest classifier achieved the best performance, with a mean receiver operating characteristic-area under curve of 0.91 and a test F1-score of 0.87. These results highlight the effectiveness of integrating deep feature learning with ensemble methods for predicting protein-protein interactions and build upon previous work focused on this fundamental problem.
Author Summary: Protein-protein interactions are fundamental in all biological processes. Predicting these interactions, however, is a key problem in molecular biology. Computational approaches have been tested to address this problem. We applied a mix of machine learning and deep learning to gain insight into the qualities of proteins that engage in interaction. First, we trained a deep learning model, which automatically learned the primary sequence and characteristics related to it, reducing bias in the actual prediction process. We combined these features, or latent representations, with amino acid frequency features of protein sequences, and called the two together "hybrid features." Then we performed a systematic comparison of amino acid frequency features alone against hybrid features across four different machine learning classifiers. Our results suggest that the random forest classifier performed best among all four classifiers at predicting interactions between proteins. We propose that this approach could be used to improve efficiency in testing protein-protein interactions at the bench and may have applications to other biologically relevant molecular interactions.
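To make the "hybrid features" idea concrete, here is a minimal sketch of how amino acid frequency features might be computed and joined with learned latent vectors. The abstract does not say how a protein pair is encoded, so the concatenation order, the function names, and the latent vectors (which would come from the autoencoder in the actual study) are all assumptions made for illustration.

```python
# The 20 standard amino acids, in a fixed order so feature positions are stable.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def aa_frequencies(sequence):
    """Return the relative frequency of each standard amino acid.

    Characters outside the 20 standard residues (e.g. 'X') are ignored.
    """
    seq = sequence.upper()
    counts = [seq.count(aa) for aa in AMINO_ACIDS]
    total = sum(counts)
    if total == 0:
        return [0.0] * len(AMINO_ACIDS)
    return [c / total for c in counts]


def hybrid_pair_features(seq_a, seq_b, latent_a, latent_b):
    """Concatenate frequency and latent features for a protein pair.

    `latent_a` and `latent_b` are placeholders for learned embeddings;
    this layout is an assumption, not the encoding used in the study.
    """
    return (
        aa_frequencies(seq_a)
        + aa_frequencies(seq_b)
        + list(latent_a)
        + list(latent_b)
    )


# Hypothetical toy inputs: two short sequences and 3-dimensional latent vectors.
freqs = aa_frequencies("AAC")
features = hybrid_pair_features("AAC", "MKV", [0.1, 0.2, 0.3], [0.4, 0.5, 0.6])
```

A feature vector built this way could then be passed to any standard classifier; comparing it against the frequency-only vector mirrors the comparison the abstract describes.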

20
Homologous recombination delayed repair in oocytes in the bdelloid rotifer Adineta vaga post radiation

Moris, V. C.; Philippart, A.; Husson, C.; Hallet, B.; Hespeels, B.; Van Doninck, K.

2026-05-05 molecular biology 10.64898/2026.04.30.722046 medRxiv
Top 0.7%
0.3%

Bdelloid rotifers are known to survive desiccation and high doses of ionizing radiation. This extreme resistance is notably due to their capacity to cope with numerous DNA double-strand breaks (DSBs). Genes encoding key components of the non-homologous end joining (NHEJ) DNA repair pathway are strongly upregulated in the bdelloid rotifer Adineta vaga following exposure to ionizing radiation. Considering the notably high doses tolerated by these organisms, their capacity to efficiently restore genome integrity is particularly striking. Although NHEJ is generally regarded as less accurate than homologous recombination (HR), the absence of major genomic rearrangements in the descendants of irradiated rotifers suggests that DNA repair occurs with high fidelity. Terwagne et al. recently reported a delayed repair in germline nuclei, occurring during oocyte development when homologous chromosomes pair, thereby enabling template-based repair through HR. In this study, we established an in situ hybridization approach on A. vaga cryosections to investigate the spatial and temporal expression of key actors involved in NHEJ, HR, and Base excision repair (BER) pathways in somatic and germline tissues. We show that NHEJ (KU80) and BER-related genes (PARPs) as well as A. vaga Ligase E (putatively involved in DNA repair) are expressed early after radiation exposure in the somatic syncytium. In contrast, HR-related genes (Rad51: two paralogs, Rad54), as well as PCNA (involved in DNA replication, NER, BER, HR) are expressed later in maturing oocytes, indicating the activation of a delayed homologous recombination repair pathway in germline nuclei. Nurse cells, which express genes associated with both HR and NHEJ pathways, may rely on both mechanisms for their own DNA repair while also supplying mRNAs to the maturing oocyte. 
Our results provide new evidence for a differential regulation of DNA DSB repair pathways between soma and germline in bdelloids, with NHEJ predominating in somatic tissues and HR in the germline of A. vaga.
Abstract Figure: Summary of in situ hybridization results: genes coding for actors of NHEJ are expressed in the somatic nuclei and in the nurse nuclei of Adineta vaga individuals 2.5 hours post X-ray radiation, while genes coding for HR actors and PCNA (involved in multiple pathways including DNA replication and DNA repair: NER, BER, MR, HR) are expressed in the nurse nuclei 2.5 hours post radiation, and later in the maturing oocyte during oogenesis and in the laid eggs. Genes coding for actors of the BER pathway that are highly expressed post-radiation appear to be expressed only in the somatic syncytium 2.5 hours post radiation, as does the gene coding for Ligase E, likely involved in DNA repair.